Optimal Performance of Feedback Control Systems 
with Limited Communication over Noisy Channels 
Abstract: A discrete time stochastic feedback control system with a noisy communication channel between the sensor and the controller is considered. The sensor has limited memory. At each time, the sensor transmits an encoded symbol over the channel and updates its memory. The controller receives a noisy version of the transmitted symbol and generates a control action based on all its past observations and actions. This control action is fed back into the system. At each stage the system incurs an instantaneous cost depending on the state of the plant and the control action. The objective is to choose encoding, memory updating and control strategies to minimize the expected total cost over a finite horizon, the expected discounted cost over an infinite horizon, or the expected average cost per unit time over an infinite horizon. For each case we obtain a sequential decomposition of the optimization problem. The results are extended to the case when the sensor makes an imperfect observation of the state of the system.

I. Introduction 

Recent advances in network and communication technologies have led to an increasing interest in networked control systems (NCS) (see the papers in [1]), and in particular in the limitations imposed upon feedback control by the presence of a communication channel in the loop. Most researchers have concentrated on stability analysis of the system. The problem of stabilization of a plant with finite data rate feedback was investigated in [2]-[15]. LQG stability of deterministic and stochastic systems under various communication constraints (rate limited channels, noisy channels with input power constraint, etc.) was considered in [16]-[22]. Performance limitations in terms of lower bounds on the separation of differential entropy rates were investigated in [23], [24]. However, certain applications require performance metrics more general than the asymptotic metrics of stability and separation of differential entropy rates. In this paper we consider the class of additive performance metrics, where the total cost is the sum of the instantaneous costs along the entire path.

In problems with asymptotic performance metrics, transient behavior need not be optimal; thus strong performance bounds can be derived by using asymptotic results from probability theory, information theory and classical control theory. However, in problems with more general performance metrics, transient behavior needs to be optimal. To the best of our knowledge, performance analysis of such problems has not been addressed in the literature. We identify algorithms to obtain optimal strategies, but we have not been able to find expressions for the optimal performance or for performance bounds.



We consider a discrete-time feedback control system with a communication channel between the sensor and the controller, shown in Figure 1. Such problems arise when the plant and the controller are geographically separated. We are interested in problems in which the sensor has limited resources while the controller has no resource constraints; we model the sensor as an encoder with finite memory, the channel between the sensor and the controller as noisy, and the channel between the controller and the system as noiseless¹. At each stage the system incurs an instantaneous cost depending on the state of the plant and the control action. The objective is to choose encoding, memory updating and control strategies to minimize the expected total cost over a finite horizon, the expected discounted cost over an infinite horizon, or the expected average cost per unit time over an infinite horizon.

The key contribution of this paper is a methodology for determining jointly optimal real-time encoding, memory updating and control strategies for feedback control systems with limited communication over noisy channels. The methodology applies to general non-linear stochastic systems with an arbitrary additive performance criterion.

The remainder of this paper is organized as follows. We formulate the performance analysis of feedback control systems with limited communication over noisy channels as a decentralized stochastic optimization problem. To illustrate the key concepts associated with our solution methodology, we first consider in Section II the finite horizon problem, for which we establish structural results for optimal controllers and present a methodology for joint optimization of the encoding, memory updating and control strategies. In Section III we extend the methodology to infinite horizon problems. We discuss computational issues in obtaining numerical solutions of the dynamic programming algorithms of the finite and infinite horizon problems in Section IV. The feedback control problem when the encoder has imperfect observations of the state of the plant is considered in Section V. We conclude in Section VI.
Notation: We use uppercase letters $(X, Y, Z)$ to denote random variables and lowercase letters $(x, y, z)$ to denote their realizations. When we represent a function of random variables as a random variable $(P_{X_t,M_t}, P_{X_{t+1},M_t}, \hat{P}_{X_{t+1},M_t})$, a tilde above the variable denotes its realization $(\tilde{P}_{X_t,M_t}, \tilde{P}_{X_{t+1},M_t}, \tilde{\hat{P}}_{X_{t+1},M_t})$. When we use Greek letters to represent a random variable $(\pi^c, \pi^l, \pi^g)$, a tilde above the variable denotes its realization $(\tilde{\pi}^c, \tilde{\pi}^l, \tilde{\pi}^g)$. We also use the shorthand notation $x^t$ to represent the sequence $x_1, \ldots, x_t$, and similar notation for random variables and functions.

¹In the sequel we show that this assumption does not entail any loss of generality.

II. The Finite Horizon Problem 
A. Problem Formulation 



Fig. 1. Feedback control system with noisy communication 

Consider a discrete time feedback control system as shown in Figure 1, which operates for a horizon $T$. The state evolution is given by

$$X_{t+1} = f(X_t, U_t, W_t), \tag{1}$$

where $f$ is the system evolution function. The variables $X_t$, $U_t$, $W_t$ denote the state of the system, the control action and the plant disturbance, respectively, at time $t$. We assume that all variables are discrete. For all $t$, $X_t$ takes values in $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$, $U_t$ takes values in $\mathcal{U} = \{1, 2, \ldots, |\mathcal{U}|\}$ and $W_t$ takes values in $\mathcal{W} = \{1, 2, \ldots, |\mathcal{W}|\}$. The initial state $X_1$ is a random variable with PMF $P_{X_1}$. The random variables $W_1, \ldots, W_T$ are i.i.d. with PMF $P_W$ and are also independent of $X_1$.

The sensor, consisting of an encoder and a memory, makes perfect observations of the state of the system. At each time instant $t$, the encoder generates an encoded symbol $Z_t$, taking values in $\mathcal{Z} = \{1, \ldots, |\mathcal{Z}|\}$, as follows:

$$Z_t = c_t(X_t, M_{t-1}), \tag{2}$$

where $c_t$ is the encoding function at time $t$ and $M_{t-1}$ denotes the content of the sensor's memory at time $t-1$. $M_t$ takes values in $\mathcal{M} = \{1, \ldots, |\mathcal{M}|\}$ and is updated according to

$$M_t = l_t(X_t, M_{t-1}), \tag{3}$$

where $l_t$ is the memory update function at time $t$. Observe that the sensor has a finite size memory and, though it makes perfect observations of the state of the system, it cannot store all past observations. Thus, it does not have perfect recall and at each stage it must selectively shed information.

The encoded symbol $Z_t$ is transmitted over a noisy communication channel and a channel output $Y_t$ is generated according to

$$Y_t = h(Z_t, N_t), \tag{4}$$

where $h$ is the channel and $N_t$ denotes the channel noise. $Y_t$ takes values in $\mathcal{Y} = \{1, \ldots, |\mathcal{Y}|\}$ and $N_t$ takes values in $\mathcal{N} = \{1, \ldots, |\mathcal{N}|\}$. The sequence of random variables $N_1, \ldots, N_T$ is i.i.d. with given PMF $P_N$. $N_1, \ldots, N_T$ are also independent of $X_1, W_1, \ldots, W_T$.

The controller observes the channel outputs and generates a control action $U_t$ as follows:

$$U_t = g_t(Y^t, U^{t-1}), \tag{5}$$

where $g_t$ is the control law at time $t$. $U_t$ takes values in $\mathcal{U} = \{1, \ldots, |\mathcal{U}|\}$. A uniformly bounded cost function $\rho : \mathcal{X} \times \mathcal{U} \to [0, K]$, $K < \infty$, is given. At each $t$, an instantaneous cost $\rho(X_t, U_t)$ is incurred.

The collection $(\mathcal{X}, \mathcal{W}, \mathcal{M}, \mathcal{Z}, \mathcal{N}, \mathcal{Y}, \mathcal{U}, P_{X_1}, P_W, P_N, f, h, \rho, T)$ is called a perfect observation system. The choice of $(c, l, g)$, $c = (c_1, \ldots, c_T)$, $l = (l_1, \ldots, l_T)$, $g = (g_1, \ldots, g_T)$, is called a design.

The performance of a design, quantified by the expected total cost under that design, is given by

$$J_T(c, l, g) = \mathbb{E}\Big\{ \sum_{t=1}^{T} \rho(X_t, U_t) \,\Big|\, c, l, g \Big\}, \tag{6}$$

where the expectation in (6) is with respect to the joint measure on $(X_1, \ldots, X_T, U_1, \ldots, U_T)$ generated by $P_{X_1}$, $P_W$, $P_N$, $f$, $h$ and the choice of design $(c, l, g)$. We are interested in the following optimization problem:

Problem 1: Given a perfect observation system $(\mathcal{X}, \mathcal{W}, \mathcal{M}, \mathcal{Z}, \mathcal{N}, \mathcal{Y}, \mathcal{U}, P_{X_1}, P_W, P_N, f, h, \rho, T)$, choose a design $(c^*, l^*, g^*)$ such that

$$J_T(c^*, l^*, g^*) = J_T^* = \min_{(c, l, g) \in \mathcal{C}^T \times \mathcal{L}^T \times \mathcal{G}^T} J_T(c, l, g), \tag{7}$$

where $\mathcal{C}^T = \mathcal{C} \times \cdots \times \mathcal{C}$ ($T$ times), $\mathcal{C}$ is the space of functions from $\mathcal{X} \times \mathcal{M}$ to $\mathcal{Z}$, $\mathcal{L}^T = \mathcal{L} \times \cdots \times \mathcal{L}$ ($T$ times), $\mathcal{L}$ is the space of functions from $\mathcal{X} \times \mathcal{M}$ to $\mathcal{M}$, $\mathcal{G}^T = \mathcal{G}_1 \times \cdots \times \mathcal{G}_T$, and $\mathcal{G}_t$ is the space of functions from $\mathcal{Y}^t \times \mathcal{U}^{t-1}$ to $\mathcal{U}$.

a) Remark: There is no loss of generality in assuming a noiseless feedback channel. Suppose there is noise in the feedback channel, and the input to the system is $\tilde{U}_t$, a noisy version of $U_t$ given by

$$\tilde{U}_t = \tilde{h}(U_t, \tilde{N}_t), \tag{8}$$

where $\tilde{h}$ is the feedback channel and $\tilde{N}_t$ denotes the noise in the feedback channel. $\tilde{N}_1, \ldots, \tilde{N}_T$ is a sequence of independent random variables that is also independent of $X_1, W_1, \ldots, W_T$ and $N_1, \ldots, N_T$. This model can be transformed into one equivalent to (1)-(5) by setting

$$\tilde{W}_t = (W_t, \tilde{N}_t), \tag{9}$$

$$X_{t+1} = f\big(X_t, \tilde{h}(U_t, \tilde{N}_t), W_t\big) = \tilde{f}(X_t, U_t, \tilde{W}_t). \tag{10}$$

Thus, without loss of generality, we can assume a noiseless feedback channel.
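To make the closed loop (1)-(6) concrete, the sketch below estimates $J_T(c, l, g)$ by Monte Carlo simulation for a small, hypothetical binary instance. The plant, channel, cost, PMFs and the two designs are illustrative assumptions, not taken from this paper, and all sets are indexed from 0 rather than 1 for convenience.

```python
import random

# Hypothetical toy instance: X = M = Z = Y = U = {0, 1} (0-indexed here).
P_X1 = [0.5, 0.5]   # PMF of the initial state X_1
P_W = [0.8, 0.2]    # PMF of the plant disturbance W_t
P_N = [0.9, 0.1]    # PMF of the channel noise N_t

def f(x, u, w):
    """Plant (1): the action u = 1 flips the state; W_t = 1 flips it again."""
    return x ^ u ^ w

def h(z, n):
    """Channel (4): binary symmetric channel, N_t = 1 flips the symbol."""
    return z ^ n

def rho(x, u):
    """Bounded instantaneous cost: state 1 is bad, acting costs 0.1."""
    return float(x) + 0.1 * u

def sample(pmf, rng):
    return rng.choices(range(len(pmf)), weights=pmf)[0]

def run_once(c, l, g, T, rng, m0=0):
    """One sample path of the closed loop; returns sum_t rho(X_t, U_t)."""
    x, m = sample(P_X1, rng), m0
    ys, us, total = [], [], 0.0
    for t in range(T):
        z = c[t](x, m)                      # (2): Z_t = c_t(X_t, M_{t-1})
        m = l[t](x, m)                      # (3): M_t = l_t(X_t, M_{t-1})
        ys.append(h(z, sample(P_N, rng)))   # (4): Y_t = h(Z_t, N_t)
        u = g[t](tuple(ys), tuple(us))      # (5): U_t = g_t(Y^t, U^{t-1})
        us.append(u)
        total += rho(x, u)
        x = f(x, u, sample(P_W, rng))       # (1): X_{t+1} = f(X_t, U_t, W_t)
    return total

def estimate_J(c, l, g, T, runs=2000, seed=0):
    """Monte Carlo estimate of J_T(c, l, g) in (6)."""
    rng = random.Random(seed)
    return sum(run_once(c, l, g, T, rng) for _ in range(runs)) / runs

T = 10
send_state = [lambda x, m: x] * T          # c_t(x, m) = x
keep_state = [lambda x, m: x] * T          # l_t(x, m) = x
act_on_last_y = [lambda ys, us: ys[-1]] * T
never_act = [lambda ys, us: 0] * T
J_informed = estimate_J(send_state, keep_state, act_on_last_y, T)
J_passive = estimate_J(send_state, keep_state, never_act, T)
```

For this toy instance the exact expected costs work out to 3.1672 for the design that acts on the last channel output and 5.0 for the design that never acts, so the two estimates should land close to those values.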

B. Salient Features of the Problem 

Problem 1 is a decentralized multi-agent stochastic optimization problem. The agents, namely the sensor and the controller, share a common objective of minimizing the expected total cost. They have access to different (and non-nested) information about the underlying state of nature. Furthermore, the actions taken by an agent at any instant of time affect the observations of the other agent at future time instants. Thus the problem is a sequential (in the sense of [25]) dynamic team with a strictly non-classical information structure [26]. Dynamic teams are, in general, functional optimization problems with a complex interdependence among the decision rules [27]. This interdependence leads to non-convex (in policy space) optimization problems that are hard to solve (see [28] for an example). Identifying an information state sufficient for performance evaluation [29], [30] is a key step in obtaining a sequential decomposition of such problems. To obtain a sequential decomposition of Problem 1 we proceed as follows. First, we derive structural properties of optimal controllers. Using these structural results we transform Problem 1 into an equivalent optimization problem and identify information states for this equivalent problem. This yields a sequential decomposition of Problem 1 along with a dynamic programming algorithm to obtain an optimal design.



C. Structural Results

In this section we present structural properties of optimal controllers. For this purpose we define the following.

Definition 1: Let $P_{X_t,M_t}$, $P_{X_{t+1},M_t}$ and $\hat{P}_{X_{t+1},M_t}$ be random vectors defined as follows:

$$P_{X_t,M_t}(x, m) = \Pr(X_t = x, M_t = m \mid Y^t, U^{t-1}, c^t, l^t, g^{t-1}), \tag{11}$$

$$P_{X_{t+1},M_t}(x, m) = \Pr(X_{t+1} = x, M_t = m \mid Y^t, U^t, c^t, l^t, g^t), \tag{12}$$

$$\hat{P}_{X_{t+1},M_t}(x, m) = \Pr(X_{t+1} = x, M_t = m \mid Y^{t+1}, U^t, c^{t+1}, l^t, g^t). \tag{13}$$

For any particular realization $y^t, u^{t-1}$ of $(Y^t, U^{t-1})$ and an arbitrary (but fixed) choice of $c^t, l^t, g^{t-1}$, the realization of $P_{X_t,M_t}$, denoted by $\tilde{P}_{X_t,M_t}$, is a PMF on $(X_t, M_t)$. If $(Y^t, U^{t-1})$ is a random vector and $c^t, l^t, g^{t-1}$ are arbitrary (but fixed) functions, then $P_{X_t,M_t}$ is a random vector belonging to $\mathcal{P}^{\mathcal{X} \times \mathcal{M}}$, the space of PMFs on $\mathcal{X} \times \mathcal{M}$. Similar interpretations hold for $P_{X_{t+1},M_t}$ and $\hat{P}_{X_{t+1},M_t}$.

The beliefs given by Definition 1 are related as follows:

Lemma 1: For each stage $t$, there exist deterministic functions $\psi$, $\phi$ and $\nu$ such that

$$P_{X_{t+1},M_t} = \psi(P_{X_t,M_t}, U_t), \tag{14}$$

$$\hat{P}_{X_{t+1},M_t} = \phi(P_{X_{t+1},M_t}, Y_{t+1}, c_{t+1}), \tag{15}$$

$$P_{X_{t+1},M_{t+1}} = \nu(\hat{P}_{X_{t+1},M_t}, l_{t+1}). \tag{16}$$

Proof: Consider a component of $P_{X_{t+1},M_t}$, given by

$$P_{X_{t+1},M_t}(x, m) = \frac{\Pr(X_{t+1} = x, M_t = m, U_t = u_t \mid y^t, u^{t-1}, c^t, l^t, g^t)}{\sum_{(x', m') \in \mathcal{X} \times \mathcal{M}} \Pr(X_{t+1} = x', M_t = m', U_t = u_t \mid y^t, u^{t-1}, c^t, l^t, g^t)}.$$

Now consider

$$\begin{aligned}
&\Pr(X_{t+1} = x, M_t = m, U_t = u_t \mid y^t, u^{t-1}, c^t, l^t, g^t) \\
&= \sum_{x_t \in \mathcal{X}} \Pr(X_t = x_t, M_t = m \mid y^t, u^{t-1}, c^t, l^t, g^t) \\
&\qquad \times \Pr(U_t = u_t \mid y^t, u^{t-1}, c^t, l^t, g^t, X_t = x_t, M_t = m) \\
&\qquad \times \Pr(X_{t+1} = x \mid X_t = x_t, M_t = m, y^t, u^t, c^t, l^t, g^t) \\
&\overset{(a)}{=} \Pr(u_t \mid y^t, u^{t-1}, g_t) \sum_{x_t \in \mathcal{X}} \Pr(X_t = x_t, M_t = m \mid y^t, u^{t-1}, c^t, l^t, g^{t-1}) \Pr(X_{t+1} = x \mid x_t, u_t) \\
&= \Pr(u_t \mid y^t, u^{t-1}, g_t) \sum_{x_t \in \mathcal{X}} P_{X_t,M_t}(x_t, m) \Pr(X_{t+1} = x \mid x_t, u_t),
\end{aligned} \tag{18}$$

where (a) follows from (1) and (5). Combining the ratio above with (18) and cancelling $\Pr(u_t \mid y^t, u^{t-1}, g_t)$ from its numerator and denominator gives

$$P_{X_{t+1},M_t} = \psi(P_{X_t,M_t}, U_t), \tag{19}$$

where $\psi$ is given by the ratio above and (18). After the cancellation the denominator equals one, so explicitly

$$\psi(P_{X_t,M_t}, u_t)(x, m) = \sum_{x_t \in \mathcal{X}} P_{X_t,M_t}(x_t, m) \sum_{w \in \mathcal{W}} P_W(w)\, \mathbb{1}[x = f(x_t, u_t, w)],$$

where $\mathbb{1}[\cdot]$ is the indicator function.

Consider a component of $\hat{P}_{X_{t+1},M_t}$, given by

$$\hat{P}_{X_{t+1},M_t}(x, m) = \frac{\Pr(X_{t+1} = x, M_t = m, Y_{t+1} = y_{t+1} \mid y^t, u^t, c^{t+1}, l^t, g^t)}{\sum_{(x', m') \in \mathcal{X} \times \mathcal{M}} \Pr(X_{t+1} = x', M_t = m', Y_{t+1} = y_{t+1} \mid y^t, u^t, c^{t+1}, l^t, g^t)}. \tag{17}$$

Now consider

$$\begin{aligned}
&\Pr(X_{t+1} = x, M_t = m, Y_{t+1} = y_{t+1} \mid y^t, u^t, c^{t+1}, l^t, g^t) \\
&= \Pr(X_{t+1} = x, M_t = m \mid y^t, u^t, c^{t+1}, l^t, g^t) \\
&\qquad \times \Pr(Y_{t+1} = y_{t+1} \mid y^t, u^t, c^{t+1}, l^t, g^t, X_{t+1} = x, M_t = m) \\
&\overset{(b)}{=} \Pr(X_{t+1} = x, M_t = m \mid y^t, u^t, c^t, l^t, g^t) \Pr(Y_{t+1} = y_{t+1} \mid X_{t+1} = x, M_t = m, c_{t+1}) \\
&= P_{X_{t+1},M_t}(x, m) \Pr(Y_{t+1} = y_{t+1} \mid X_{t+1} = x, M_t = m, c_{t+1}),
\end{aligned} \tag{21}$$

where (b) follows from (2) and (4). Combining (17) and (21) we have

$$\hat{P}_{X_{t+1},M_t} = \phi(P_{X_{t+1},M_t}, Y_{t+1}, c_{t+1}), \tag{22}$$

where $\phi$ is given by (17) and (21), with $\Pr(Y_{t+1} = y \mid X_{t+1} = x, M_t = m, c_{t+1}) = \sum_{n \in \mathcal{N}} P_N(n)\, \mathbb{1}[y = h(c_{t+1}(x, m), n)]$.

Finally, consider a component of $P_{X_{t+1},M_{t+1}}$:

$$\begin{aligned}
P_{X_{t+1},M_{t+1}}(x, m) &= \Pr(X_{t+1} = x, M_{t+1} = m \mid y^{t+1}, u^t, c^{t+1}, l^{t+1}, g^t) \\
&= \sum_{m_t \in \mathcal{M}} \Pr(X_{t+1} = x, M_t = m_t \mid y^{t+1}, u^t, c^{t+1}, l^{t+1}, g^t) \\
&\qquad \times \Pr(M_{t+1} = m \mid X_{t+1} = x, M_t = m_t, y^{t+1}, u^t, c^{t+1}, l^{t+1}, g^t) \\
&\overset{(c)}{=} \sum_{m_t \in \mathcal{M}} \Pr(X_{t+1} = x, M_t = m_t \mid y^{t+1}, u^t, c^{t+1}, l^t, g^t) \Pr(M_{t+1} = m \mid X_{t+1} = x, M_t = m_t, l_{t+1}) \\
&= \sum_{m_t \in \mathcal{M}} \hat{P}_{X_{t+1},M_t}(x, m_t)\, \mathbb{1}[m = l_{t+1}(x, m_t)] \\
&= \nu(\hat{P}_{X_{t+1},M_t}, l_{t+1}),
\end{aligned} \tag{23}$$

where (c) follows from (3). ■

The above relationships between the controller's beliefs lead to the structural results for optimal controllers.

Theorem 1: Consider Problem 1 for arbitrary (but fixed) encoding and memory update strategies $c$ and $l$, respectively. Then, without loss of optimality, we can restrict attention to control laws of the form

$$U_t = g_t(P_{X_t,M_t}). \tag{24}$$

Proof: Equations (14)-(16) of Lemma 1 can be combined to obtain

$$P_{X_{t+1},M_{t+1}} = \nu\big(\phi(\psi(P_{X_t,M_t}, U_t), Y_{t+1}, c_{t+1}), l_{t+1}\big) = F_t(P_{X_t,M_t}, U_t, Y_{t+1}, c_{t+1}, l_{t+1}). \tag{25}$$

Moreover, by (21) the denominator of (17) shows that the conditional distribution of $Y_{t+1}$ given the past depends only on $\psi(P_{X_t,M_t}, U_t)$ and $c_{t+1}$. Thus for any fixed $c$ and $l$, $P_{X_t,M_t}$ is a controlled Markov process with control action $U_t$. Further, the expected instantaneous cost can be written as

$$\mathbb{E}\{\rho(X_t, U_t) \mid y^t, u^t, c^t, l^t, g^t\} = \sum_{(x_t, m_t) \in \mathcal{X} \times \mathcal{M}} \rho(x_t, u_t)\, P_{X_t,M_t}(x_t, m_t) = \hat{\rho}(P_{X_t,M_t}, u_t). \tag{26}$$

There is a subtle technicality in the first step of (26); see [31] for details. Hence we have a perfectly observed controlled Markov process $\{P_{X_t,M_t}, t = 1, \ldots, T\}$ with control action $U_t$ and instantaneous cost $\hat{\rho}(P_{X_t,M_t}, U_t)$. From Markov decision theory [30] we know that there is no loss of optimality in restricting attention to control laws of the form (24). ■

1) Implication of the structural results: Theorem 1 implies that at each stage $t$, without loss of optimality, we can restrict attention to controllers belonging to the family $\mathcal{G}$ of functions from $\mathcal{P}^{\mathcal{X} \times \mathcal{M}}$ to $\mathcal{U}$. Thus at each stage we optimize over a fixed (rather than time-varying) domain, and Problem 1 is equivalent to the following problem:

Problem 2: Given a perfect observation system $(\mathcal{X}, \mathcal{W}, \mathcal{M}, \mathcal{Z}, \mathcal{N}, \mathcal{Y}, \mathcal{U}, P_{X_1}, P_W, P_N, f, h, \rho, T)$, choose a design $(c^*, l^*, g^*)$ that is optimal with respect to the performance criterion of (6), i.e.,

$$J_T(c^*, l^*, g^*) = J_T^* = \min_{(c, l, g) \in \mathcal{C}^T \times \mathcal{L}^T \times \mathcal{G}^T} J_T(c, l, g), \tag{27}$$

where $\mathcal{C}^T$ and $\mathcal{L}^T$ are as in Problem 1 and $\mathcal{G}^T \triangleq \mathcal{G} \times \cdots \times \mathcal{G}$ ($T$ times).

Thus we have an optimization problem in which the action space does not change with time. In the next subsection we provide a sequential decomposition of Problem 2.
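The three maps of Lemma 1 are simple enough to code directly when beliefs are stored as dictionaries over $\mathcal{X} \times \mathcal{M}$. The sketch below does this for a hypothetical binary plant and channel; the PMFs, $f$, $h$ and the design $c(x, m) = x$, $l(x, m) = x$ are assumptions chosen for illustration. Note that the update $\phi$ needs the received channel output as an argument.

```python
from collections import defaultdict

# Hypothetical toy model (values are illustrative assumptions): binary sets, 0-indexed.
P_W = [0.8, 0.2]                     # PMF of the plant disturbance W_t
P_N = [0.9, 0.1]                     # PMF of the channel noise N_t
f = lambda x, u, w: x ^ u ^ w        # plant (1)
h = lambda z, n: z ^ n               # channel (4)

def psi(P, u):
    """(19): P_{X_{t+1},M_t}(x, m) = sum_{x_t} P_{X_t,M_t}(x_t, m) Pr(x | x_t, u)."""
    out = defaultdict(float)
    for (x, m), p in P.items():
        for w, pw in enumerate(P_W):
            out[(f(x, u, w), m)] += p * pw
    return dict(out)

def channel_lik(y, z):
    """Pr(Y = y | Z = z) = sum_n P_N(n) 1[y = h(z, n)]."""
    return sum(pn for n, pn in enumerate(P_N) if h(z, n) == y)

def phi(P, y, c):
    """(22): Bayes update of P_{X_{t+1},M_t} after receiving Y_{t+1} = y."""
    joint = {(x, m): p * channel_lik(y, c(x, m)) for (x, m), p in P.items()}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("y has zero probability under this belief and encoder")
    return {k: v / total for k, v in joint.items() if v > 0}

def nu(P_hat, l):
    """(23): push the belief through the memory update M_{t+1} = l(X_{t+1}, M_t)."""
    out = defaultdict(float)
    for (x, m), p in P_hat.items():
        out[(x, l(x, m))] += p
    return dict(out)

# One stage of Lemma 1 with c(x, m) = x and l(x, m) = x.
P = {(0, 0): 0.9, (1, 0): 0.1}        # P_{X_t,M_t}
P1 = psi(P, u=1)                      # approx. 0.74 on (1, 0), 0.26 on (0, 0)
P2 = phi(P1, y=1, c=lambda x, m: x)   # posterior after Y_{t+1} = 1
P3 = nu(P2, l=lambda x, m: x)         # support moves to (0, 0) and (1, 1)
```

Storing only the support keeps the sketch short; a dense array over $\mathcal{X} \times \mathcal{M}$ would work equally well.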



D. Joint Optimization 

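In this section, we identify information states sufficient for performance evaluation of Problem 2, resulting in its sequential decomposition. Since Problem 2 is equivalent to Problem 1, we also obtain a sequential decomposition of Problem 1. The intuition behind our approach is as follows. As mentioned in Section II-B, the agents act in a sequential manner. Let $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ be the information states of the encoder, the memory update and the controller, respectively. For these to be valid states, they must satisfy the property

$$\pi^c_t \xrightarrow{\;c_t\;} \pi^l_t \xrightarrow{\;l_t\;} \pi^g_t \xrightarrow{\;g_t\;} \pi^c_{t+1},$$

that is, at each time instant $t$, $\pi^l_t$ can be determined from $\pi^c_t$ and $c_t$, $\pi^g_t$ can be determined from $\pi^l_t$ and $l_t$, and $\pi^c_{t+1}$ can be determined from $\pi^g_t$ and $g_t$. This ensures that $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ are information states in the sense of [30]. However, a system can have more than one information state, and not all of them are sufficient for performance evaluation (see [29]). To be sufficient for performance evaluation, the information states must absorb (summarize) the effect of past decision rules on the expected future cost, that is,

$$\begin{aligned}
\mathbb{E}\Big\{ \sum_{s=t}^{T} \rho(X_s, U_s) \,\Big|\, c, l, g \Big\}
&= \mathbb{E}\Big\{ \sum_{s=t}^{T} \rho(X_s, U_s) \,\Big|\, \pi^c_t, c_t^T, l_t^T, g_t^T \Big\} \\
&= \mathbb{E}\Big\{ \sum_{s=t}^{T} \rho(X_s, U_s) \,\Big|\, \pi^l_t, c_{t+1}^T, l_t^T, g_t^T \Big\} \\
&= \mathbb{E}\Big\{ \sum_{s=t}^{T} \rho(X_s, U_s) \,\Big|\, \pi^g_t, c_{t+1}^T, l_{t+1}^T, g_t^T \Big\},
\end{aligned} \tag{28}$$

where $a_t^T$ denotes the sequence $(a_t, \ldots, a_T)$ for $a \in \{c, l, g\}$.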



Furthermore, to extend the results of the finite horizon problem 
to infinite horizon problems, we want the domain of informa- 
tion states to be time-invariant. 

The following information states satisfy the above require- 
ments. 

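Definition 2: Let $\Pi$ be the space of probability measures on $\mathcal{X} \times \mathcal{M} \times \mathcal{P}^{\mathcal{X} \times \mathcal{M}}$. Define $\pi^c_t$, $\pi^l_t$, $\pi^g_t$, $t = 1, \ldots, T$, as follows:

1) $\pi^c_t = \Pr(X_t, M_{t-1}, P_{X_t,M_{t-1}})$.

2) $\pi^l_t = \Pr(X_t, M_{t-1}, \hat{P}_{X_t,M_{t-1}})$.

3) $\pi^g_t = \Pr(X_t, M_t, P_{X_t,M_t})$.

Here $P_{X_t,M_{t-1}}$ and $\hat{P}_{X_t,M_{t-1}}$ are the beliefs (12) and (13) evaluated at stage $t-1$.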

The unconditional PMFs $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ defined above are information states sufficient for performance evaluation of Problem 2. Specifically, they satisfy the following properties:

Lemma 2: $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ are information states for the encoder, the memory update and the controller, respectively, i.e.,

1) there exist linear transformations $Q^c(c_t)$, $Q^l(l_t)$ and $Q^g(g_t)$ such that

$$\pi^l_t = Q^c(c_t)\,\pi^c_t, \tag{29}$$

$$\pi^g_t = Q^l(l_t)\,\pi^l_t, \tag{30}$$

$$\pi^c_{t+1} = Q^g(g_t)\,\pi^g_t; \tag{31}$$

2) the conditional expected instantaneous cost can be expressed as

$$\mathbb{E}\{\rho(X_t, U_t) \mid c^t, l^t, g^t\} = \tilde{\rho}(\pi^g_t, g_t), \tag{32}$$

where $\tilde{\rho}$ is a deterministic function.

Proof: This follows from Lemma 1 and Definition 2. See [31] for details. ■
Using this result, the performance criterion of (6) can be rewritten as

$$\mathbb{E}\Big\{ \sum_{t=1}^{T} \rho(X_t, U_t) \,\Big|\, c, l, g \Big\} = \sum_{t=1}^{T} \mathbb{E}\{\rho(X_t, U_t) \mid c^t, l^t, g^t\} = \sum_{t=1}^{T} \tilde{\rho}(\pi^g_t, g_t), \tag{33}$$

where the sequence $\{\pi^g_1, \ldots, \pi^g_T\}$ depends on the choice of $(c, l, g)$. Hence, Problem 2 is equivalent to the following deterministic problem:



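Problem 3: Consider a deterministic system with states $\pi^c_t$, $\pi^l_t$, $\pi^g_t$. The initial state $\pi^c_1$ is known and, for $t \ge 1$, the system evolves as follows:

$$\pi^l_t = Q^c(c_t)\,\pi^c_t, \tag{34}$$

$$\pi^g_t = Q^l(l_t)\,\pi^l_t, \tag{35}$$

$$\pi^c_{t+1} = Q^g(g_t)\,\pi^g_t, \tag{36}$$

where $c_t$, $l_t$, $g_t$ belong to $\mathcal{C}$, $\mathcal{L}$, $\mathcal{G}$, respectively, and $Q^c$, $Q^l$, $Q^g$ are known linear transformations. At time $t$, an instantaneous cost $\tilde{\rho}(\pi^g_t, g_t)$ is incurred.

The optimization problem is to determine a design $(c, l, g)$, where $c = (c_1, \ldots, c_T)$, $l = (l_1, \ldots, l_T)$ and $g = (g_1, \ldots, g_T)$, to minimize the total cost over horizon $T$, i.e.,

$$\min_{(c, l, g) \in \mathcal{C}^T \times \mathcal{L}^T \times \mathcal{G}^T} \sum_{t=1}^{T} \tilde{\rho}(\pi^g_t, g_t). \tag{37}$$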
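When $\pi^c_1$ has finite support, the maps $Q^c$, $Q^l$, $Q^g$ keep it finitely supported (the channel output and the disturbance are finite), so for a small example they can be applied exactly. The sketch below stores each information state as a dictionary from atoms $(x, m, \text{belief})$ to probabilities and evaluates (33) by iterating (34)-(36). The toy plant, channel, cost and stationary design (including the threshold controller `g`) are hypothetical, and building $Q^c$, $Q^l$, $Q^g$ from $\phi$, $\nu$, $\psi$ in this way is our reading of Definition 2 and Lemma 1, not code from the paper.

```python
from collections import defaultdict

# Hypothetical toy system, 0-indexed binary sets (all values are assumptions).
P_X1, P_W, P_N = [0.5, 0.5], [0.8, 0.2], [0.9, 0.1]
Y_VALS = (0, 1)
f = lambda x, u, w: x ^ u ^ w          # plant (1)
h = lambda z, n: z ^ n                 # channel (4)
rho = lambda x, u: float(x) + 0.1 * u  # cost, bounded by K = 1.1

def freeze(P):
    """Hashable form of a belief {(x, m): prob}, so it can label an atom."""
    return tuple(sorted(P.items()))

def lik(y, z):
    return sum(p for n, p in enumerate(P_N) if h(z, n) == y)

def psi(P, u):                         # Lemma 1, (14)
    out = defaultdict(float)
    for (x, m), p in P.items():
        for w, pw in enumerate(P_W):
            out[(f(x, u, w), m)] += p * pw
    return dict(out)

def phi(P, y, c):                      # Lemma 1, (15)
    joint = {k: p * lik(y, c(*k)) for k, p in P.items()}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items() if v > 0}

def nu(P, l):                          # Lemma 1, (16)
    out = defaultdict(float)
    for (x, m), p in P.items():
        out[(x, l(x, m))] += p
    return dict(out)

# An information state is a finitely supported measure {(x, m, frozen_belief): prob}.
def Q_c(c, pi):                        # (29): encoder stage, average over Y_t
    out = defaultdict(float)
    for (x, m, b), w in pi.items():
        for y in Y_VALS:
            py = lik(y, c(x, m))
            if py > 0:
                out[(x, m, freeze(phi(dict(b), y, c)))] += w * py
    return dict(out)

def Q_l(l, pi):                        # (30): memory-update stage
    out = defaultdict(float)
    for (x, m, b), w in pi.items():
        out[(x, l(x, m), freeze(nu(dict(b), l)))] += w
    return dict(out)

def Q_g(g, pi):                        # (31): control stage, average over W_t
    out = defaultdict(float)
    for (x, m, b), w in pi.items():
        u = g(dict(b))
        nb = freeze(psi(dict(b), u))
        for wv, pw in enumerate(P_W):
            out[(f(x, u, wv), m, nb)] += w * pw
    return dict(out)

def rho_tilde(pi, g):                  # (32)
    return sum(w * rho(x, g(dict(b))) for (x, m, b), w in pi.items())

def total_cost(c, l, g, T, m0=0):
    """J_T of a stationary design via (33), propagating (34)-(36) from pi^c_1."""
    prior = {(x, m0): p for x, p in enumerate(P_X1)}
    pi = {(x, m0, freeze(prior)): p for x, p in enumerate(P_X1)}
    J = 0.0
    for _ in range(T):
        pi_g = Q_l(l, Q_c(c, pi))
        J += rho_tilde(pi_g, g)
        pi = Q_g(g, pi_g)
    return J

c = lambda x, m: x                     # send the current state
l = lambda x, m: x                     # remember the current state
g = lambda P: int(sum(p for (x, _), p in P.items() if x == 1) > 0.5)
J10 = total_cost(c, l, g, T=10)
```

In this toy instance the threshold controller ends up acting exactly on the last channel output, and `J10` evaluates to approximately 3.1672. The number of atoms grows with $T$, which gives one concrete view of why the off-line computation is expensive.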

This is a classical deterministic optimal control problem; 
optimal functions (c*,l*,g*) are determined as follows: 

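Theorem 2: An optimal design $(c^*, l^*, g^*)$ for Problem 3 (and consequently for Problem 2 and thereby for Problem 1) is given by the following nested optimality equations:

$$V^g_T(\pi^g) = \inf_{g_T \in \mathcal{G}} \tilde{\rho}(\pi^g, g_T), \tag{38}$$

and for $t = 1, \ldots, T$,

$$V^c_t(\pi^c) = \min_{c_t \in \mathcal{C}} V^l_t\big(Q^c(c_t)\,\pi^c\big), \tag{39}$$

$$V^l_t(\pi^l) = \min_{l_t \in \mathcal{L}} V^g_t\big(Q^l(l_t)\,\pi^l\big), \tag{40}$$

and for $t = 1, \ldots, T-1$,

$$V^g_t(\pi^g) = \inf_{g_t \in \mathcal{G}} \Big[ \tilde{\rho}(\pi^g, g_t) + V^c_{t+1}\big(Q^g(g_t)\,\pi^g\big) \Big]. \tag{41}$$

The argmin (or arginf) at each step determines the corresponding optimal design for that stage. Furthermore, the optimal performance is given by

$$J_T^* = V^c_1(\pi^c_1). \tag{42}$$

Proof: This is a standard result; see [30, Chapter 2]. ■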
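Because $\Pi$ is uncountable, (38)-(41) cannot be tabulated directly. The sketch below only illustrates the nesting of Theorem 2 (controller stage, then memory-update stage, then encoder stage, backwards in time) on a hypothetical finite stand-in: a handful of abstract states, arbitrary deterministic maps in place of $Q^c$, $Q^l$, $Q^g$, and a random cost in place of $\tilde{\rho}$. It then checks the recursion against brute-force enumeration of all open-loop designs, which is valid here because the system is deterministic with a known initial state.

```python
import itertools
import random

def nested_dp(T, S, C, L, G, Qc, Ql, Qg, cost):
    """Backward recursion with the nesting of (38)-(41) on a finite stand-in for Pi.

    Qc[c][s], Ql[l][s], Qg[g][s] are next states; cost[s][g] plays the role of rho~.
    Returns V^c_1 as a dict and, per stage, the minimizing (c, l, g) at each state.
    """
    Vc_next, policy = None, [None] * T
    for t in reversed(range(T)):
        # (38)/(41): controller stage
        Vg, g_arg = {}, {}
        for s in S:
            q = {g: cost[s][g] + (Vc_next[Qg[g][s]] if Vc_next is not None else 0.0)
                 for g in G}
            g_arg[s] = min(q, key=q.get)
            Vg[s] = q[g_arg[s]]
        # (40): memory-update stage
        l_arg = {s: min(L, key=lambda l: Vg[Ql[l][s]]) for s in S}
        Vl = {s: Vg[Ql[l_arg[s]][s]] for s in S}
        # (39): encoder stage
        c_arg = {s: min(C, key=lambda c: Vl[Qc[c][s]]) for s in S}
        Vc = {s: Vl[Qc[c_arg[s]][s]] for s in S}
        policy[t] = (c_arg, l_arg, g_arg)
        Vc_next = Vc
    return Vc_next, policy

def rollout(s, design, Qc, Ql, Qg, cost):
    """Total cost (37) of an open-loop sequence of (c_t, l_t, g_t) from state s."""
    total = 0.0
    for c, l, g in design:
        s = Ql[l][Qc[c][s]]
        total += cost[s][g]
        s = Qg[g][s]
    return total

def follow(s, policy, Qc, Ql, Qg):
    """Read off the design selected by the argmins along the trajectory from s."""
    design = []
    for c_arg, l_arg, g_arg in policy:
        c = c_arg[s]; s = Qc[c][s]
        l = l_arg[s]; s = Ql[l][s]
        g = g_arg[s]; s = Qg[g][s]
        design.append((c, l, g))
    return design

# Hypothetical random instance: 5 abstract states, two choices per agent, T = 3.
rng = random.Random(1)
S, C, L, G, T = range(5), range(2), range(2), range(2), 3
Qc = [[rng.randrange(5) for _ in S] for _ in C]
Ql = [[rng.randrange(5) for _ in S] for _ in L]
Qg = [[rng.randrange(5) for _ in S] for _ in G]
cost = [[rng.random() for _ in G] for _ in S]

V1, policy = nested_dp(T, S, C, L, G, Qc, Ql, Qg, cost)
s0 = 0
brute = min(rollout(s0, d, Qc, Ql, Qg, cost)
            for d in itertools.product(itertools.product(C, L, G), repeat=T))
dp_design = follow(s0, policy, Qc, Ql, Qg)
```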

III. Infinite Horizon Problem 

In this section we extend the model of Section II-A to an infinite horizon ($T \to \infty$) using two performance criteria:

1) Expected Discounted Cost, where the performance of a design is determined by

$$J^\beta(c, l, g) = \mathbb{E}\Big\{ \sum_{t=1}^{\infty} \beta^{t-1} \rho(X_t, U_t) \,\Big|\, c, l, g \Big\}, \tag{43}$$

where $0 < \beta < 1$ is called the discount factor.

2) Average Cost per unit time, where the performance of a design is determined by

$$\bar{J}(c, l, g) = \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\Big\{ \sum_{t=1}^{T} \rho(X_t, U_t) \,\Big|\, c, l, g \Big\}. \tag{44}$$

We take the lim sup rather than the lim because for some designs $(c, l, g)$ the limit may not exist.



Ideally, while implementing a design for infinite horizon 
problems, we would like to use time-invariant designs. This 
motivates the following definition. 

Definition 3: A design $(c, l, g)$, $c = (c_1, c_2, \ldots)$, $l = (l_1, l_2, \ldots)$, $g = (g_1, g_2, \ldots)$, is called stationary (or time-invariant) if $c_1 = c_2 = \cdots = c$, $l_1 = l_2 = \cdots = l$ and $g_1 = g_2 = \cdots = g$.

Due to the dynamic team nature of the problem, it is not immediately clear whether there exist stationary designs that are optimal (or ε-optimal). In this section we show that for the expected discounted cost problem there exist stationary designs that are optimal, and that for the average cost per unit time problem, under certain conditions, there exist stationary designs that are ε-optimal.

A. Expected Discounted Cost Problem 

Consider the infinite horizon problem with the expected discounted cost criterion given by (43). For this problem the relations of Lemma 1 hold; hence the structural result of Theorem 1 is valid, and we can restrict attention to controllers belonging to $\mathcal{G}^\infty$. Define $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ as in Definition 2; Lemma 2 can be proved as before. The transformations $Q^c$, $Q^l$, $Q^g$ and the expected instantaneous cost $\tilde{\rho}$ are the same as in the finite horizon case. Hence, the infinite horizon problem with the expected discounted cost criterion given by (43) is equivalent to Problem 3 with the optimization criterion given by

$$J^\beta(c, l, g) \triangleq \sum_{t=1}^{\infty} \beta^{t-1} \tilde{\rho}(\pi^g_t, g_t). \tag{45}$$



For this problem we have the following result:

Theorem 3: For the infinite horizon expected discounted cost problem with the performance criterion given by (45), without loss of optimality, one can restrict attention to stationary designs. Specifically, for any optimal design $(c', l', g')$ there exists a stationary design $(c_0^\infty, l_0^\infty, g_0^\infty)$, $c_0^\infty = (c_0, c_0, \ldots)$, $l_0^\infty = (l_0, l_0, \ldots)$ and $g_0^\infty = (g_0, g_0, \ldots)$, such that

$$V(\pi^c_1) = J^\beta(c_0^\infty, l_0^\infty, g_0^\infty) = J^\beta(c', l', g'), \tag{46}$$

where $V$ is the unique uniformly bounded fixed point of

$$V(\pi) = \min_{(c, l, g) \in \mathcal{C} \times \mathcal{L} \times \mathcal{G}} \Big[ \tilde{\rho}\big(Q(c, l)\pi, g\big) + \beta V\big(Q(c, l, g)\pi\big) \Big], \tag{47}$$

with

$$Q(c, l) \triangleq Q^l(l) \circ Q^c(c), \tag{48}$$

$$Q(c, l, g) \triangleq Q^g(g) \circ Q^l(l) \circ Q^c(c), \tag{49}$$

and $(c_0, l_0, g_0)$ satisfy

$$V(\pi) = \tilde{\rho}\big(Q(c_0, l_0)\pi, g_0\big) + \beta V\big(Q(c_0, l_0, g_0)\pi\big). \tag{50}$$

Proof: See [31]. ■
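The fixed point in (47) is the value function of a β-contraction, so on a finite stand-in it can be found by plain value iteration. The sketch below is hypothetical in the same way as a toy example: the abstract states, the maps standing in for $Q^c$, $Q^l$, $Q^g$ and the cost standing in for $\tilde{\rho}$ are random choices. It composes the maps as in (48)-(49), iterates to the fixed point, and then picks at each abstract state a triple attaining the minimum in (47), in the spirit of (50).

```python
import random

def value_iteration(S, A, step, beta, tol=1e-10):
    """Fixed-point iteration for V(s) = min_a [r(s, a) + beta * V(next(s, a))].

    step(s, a) -> (r, next_state).  The map is a beta-contraction in the sup norm,
    so the iterates converge to the unique bounded fixed point.
    """
    V = {s: 0.0 for s in S}
    while True:
        new = {s: min(r + beta * V[n] for r, n in (step(s, a) for a in A)) for s in S}
        if max(abs(new[s] - V[s]) for s in S) < tol:
            return new
        V = new

# Hypothetical finite stand-in for Pi with arbitrary deterministic maps.
rng = random.Random(2)
S, beta = range(6), 0.9
Qc = [[rng.randrange(6) for _ in S] for _ in range(2)]
Ql = [[rng.randrange(6) for _ in S] for _ in range(2)]
Qg = [[rng.randrange(6) for _ in S] for _ in range(2)]
cost = [[rng.random() for _ in range(2)] for _ in S]
A = [(c, l, g) for c in range(2) for l in range(2) for g in range(2)]

def step(s, a):
    """One stage of (47): cost rho~(Q(c,l)s, g) and next state Q(c,l,g)s."""
    c, l, g = a
    s_mid = Ql[l][Qc[c][s]]              # Q(c, l) s, as in (48)
    return cost[s_mid][g], Qg[g][s_mid]  # Q(c, l, g) s, as in (49)

V = value_iteration(S, A, step, beta)
best = {s: min(A, key=lambda a: step(s, a)[0] + beta * V[step(s, a)[1]]) for s in S}

def discounted(s, H=600):
    """Discounted cost of repeatedly applying the minimizing triple at each state."""
    total = 0.0
    for t in range(H):
        r, s = step(s, best[s])
        total += beta ** t * r
    return total
```

Since the minimizer may differ between abstract states, `best` is a stationary feedback selection on the stand-in; it is meant only to show the fixed-point mechanics behind (47), not to reproduce the construction of $(c_0, l_0, g_0)$ in [31].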



B. Average Cost per unit time Problem 

Consider the infinite horizon problem with the average cost per unit time criterion given by (44). For this problem the relations of Lemma 1 hold; hence the structural result of Theorem 1 is valid, and we can restrict attention to controllers belonging to $\mathcal{G}^\infty$. Define $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ as in Definition 2; Lemma 2 can be proved as before. The transformations $Q^c$, $Q^l$, $Q^g$ and the expected instantaneous cost $\tilde{\rho}$ are the same as in the finite horizon case. Hence, the infinite horizon average cost per unit time problem is equivalent to Problem 3 with the optimization criterion given by

$$\bar{J}(c, l, g) = \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \tilde{\rho}(\pi^g_t, g_t). \tag{51}$$



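For this problem we have the following result:

Theorem 4: For the infinite horizon average cost per unit time problem with the performance criterion given by (51), assume

(A1) for any $\varepsilon > 0$ there exist bounded measurable functions $v(\cdot)$ and $r(\cdot)$ and a design $(c_0, l_0, g_0) \in \mathcal{C} \times \mathcal{L} \times \mathcal{G}$ such that for all $\pi$,

$$v(\pi) = \min_{(c, l, g) \in \mathcal{C} \times \mathcal{L} \times \mathcal{G}} v\big(Q(c, l, g)\pi\big) = v\big(Q(c_0, l_0, g_0)\pi\big), \tag{52}$$

and

$$\min_{(c, l, g) \in \mathcal{C} \times \mathcal{L} \times \mathcal{G}} \Big[ \tilde{\rho}\big(Q(c, l)\pi, g\big) + r\big(Q(c, l, g)\pi\big) \Big] \le v(\pi) + r(\pi) \le \tilde{\rho}\big(Q(c_0, l_0)\pi, g_0\big) + r\big(Q(c_0, l_0, g_0)\pi\big) + \varepsilon. \tag{53}$$

Then for any horizon $T$ and any design $(c', l', g')$ for that horizon, the stationary design $(c_0^\infty, l_0^\infty, g_0^\infty)$, $c_0^\infty = (c_0, c_0, \ldots)$, $l_0^\infty = (l_0, l_0, \ldots)$, $g_0^\infty = (g_0, g_0, \ldots)$, satisfies

$$J_T(c_0^T, l_0^T, g_0^T) = T\,v(\pi^c_1) + r(\pi^c_1) \le J_T(c', l', g') + \varepsilon, \tag{54}$$

where $a^T = (a, \ldots, a)$ ($T$ times) for $a = c_0, l_0, g_0$. Further, under (A1), (54) is equivalent to

$$\bar{J}(c_0^\infty, l_0^\infty, g_0^\infty) = v(\pi^c_1) \le \bar{J}(c'', l'', g''), \tag{55}$$

where $(c'', l'', g'')$ is any infinite horizon design and

$$\bar{J}(c_0^\infty, l_0^\infty, g_0^\infty) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \tilde{\rho}\big(Q(c_0, l_0)\pi^c_t, g_0\big). \tag{56}$$

Proof: See [31].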

C. Implication of the Result 

We have shown that there exist optimal stationary designs for the infinite horizon expected discounted cost problem and, under certain conditions, ε-optimal stationary designs for the infinite horizon average cost per unit time problem. This simplifies the off-line optimization problem, since we have to choose the best amongst the $\mathcal{C} \times \mathcal{L} \times \mathcal{G}$ stationary strategies rather than amongst the $\mathcal{C}^\infty \times \mathcal{L}^\infty \times \mathcal{G}^\infty$ time-varying strategies. Further, implementing stationary strategies involves implementing one function at each agent, which is much simpler than implementing a time-varying strategy.

IV. Computational Issues 

The dynamic program of Theorem 2 for joint optimization of encoding, memory updating and control strategies is similar to a dynamic program for partially observed Markov decision problems (POMDPs) with uncountable state space and uncountable action space. The information state $\pi^g_t$ belongs to $\Pi$, the space of probability measures on $\mathcal{X} \times \mathcal{M} \times \mathcal{P}^{\mathcal{X} \times \mathcal{M}}$, which is a subset of the probability measures on $\mathbb{R}^d$ for a finite $d$ determined by $|\mathcal{X}|$ and $|\mathcal{M}|$. The action spaces $\mathcal{C}$ and $\mathcal{L}$ are finite, while the action space $\mathcal{G}$ is uncountable. Therefore, in the dynamic program of Theorem 2 the information state belongs to a space of probability measures on a finite-dimensional Euclidean space and the action space is uncountable. The standard computational techniques for solving such POMDPs can be used to obtain numerical results.

The off-line computation of an optimal design has exponential complexity. The on-line implementation, however, is simple, since only a stationary design needs to be implemented.
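One way to see why the off-line search is expensive is to count only its finite part. Using the definitions of $\mathcal{C}$ and $\mathcal{L}$ from Problem 1 (the controller space $\mathcal{G}$ is uncountable and is left out), the sketch below counts encoder and memory-update functions per stage and over $T$ stages. The sizes $|\mathcal{X}| = 4$, $|\mathcal{M}| = 2$, $|\mathcal{Z}| = 2$, $T = 5$ are arbitrary illustrative choices.

```python
def num_encoders(nX, nM, nZ):
    """|C|: number of functions X x M -> Z."""
    return nZ ** (nX * nM)

def num_memory_updates(nX, nM):
    """|L|: number of functions X x M -> M."""
    return nM ** (nX * nM)

def num_finite_designs(nX, nM, nZ, T):
    """|C^T x L^T|: encoder/memory-update sequences over T stages (controllers excluded)."""
    return (num_encoders(nX, nM, nZ) * num_memory_updates(nX, nM)) ** T

print(num_encoders(4, 2, 2))           # 256
print(num_memory_updates(4, 2))        # 256
print(num_finite_designs(4, 2, 2, 5))  # 2**80 = 1208925819614629174706176
```

The count is exponential in $|\mathcal{X}||\mathcal{M}|$ per stage and in $T$ across stages; restricting to stationary designs removes the second exponent but not the first.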

V. Sensors with Imperfect Observations 

So far we have assumed that the sensor perfectly observes the state of the system. However, in many practical systems, the sensor observations are noisy due to external disturbances and the intrinsic noise in the measurement hardware. In this section we model this scenario and show that noisy observations by the sensor do not alter the nature of the problem. We first consider the finite horizon case.

A. Problem Formulation 

Fig. 2. Feedback control system with noisy communication and imperfect 
observations. 

Consider a discrete time imperfect observation system as shown in Figure 2, which operates for $T$ time steps. The state of the system $X_t$ evolves according to (1). The observations $S_t$ made by the sensor at time $t$ are noisy versions of the state of the system and are given by

$$S_t = \bar{h}(X_t, \bar{N}_t), \tag{57}$$

where $\bar{N}_t$ denotes the observation noise and $\bar{h}$ is the observation channel. $S_t$ takes values in $\mathcal{S} = \{1, \ldots, |\mathcal{S}|\}$ and $\bar{N}_t$ takes values in $\bar{\mathcal{N}} = \{1, \ldots, |\bar{\mathcal{N}}|\}$. The random variables $\bar{N}_1, \ldots, \bar{N}_T$ are i.i.d. with PMF $P_{\bar{N}}$ and are also independent of $X_1, W_1, \ldots, W_T, N_1, \ldots, N_T$.

The sensor is modeled as in Section II-A and operates as follows:

$$Z_t = c_t(S_t, M_{t-1}), \tag{58}$$

$$M_t = l_t(S_t, M_{t-1}). \tag{59}$$

All other components of the system (the channel, the controller and the performance measure) are modeled as in Section II-A. The collection $(\mathcal{X}, \mathcal{W}, \bar{\mathcal{N}}, \mathcal{S}, \mathcal{M}, \mathcal{Z}, \mathcal{N}, \mathcal{Y}, \mathcal{U}, P_{X_1}, P_W, P_{\bar{N}}, P_N, f, \bar{h}, h, \rho, T)$ is called an imperfect observation system. The choice of $(c, l, g)$, $c = (c_1, \ldots, c_T)$, $l = (l_1, \ldots, l_T)$, $g = (g_1, \ldots, g_T)$, is called a design. The performance of a design, quantified by the expected total cost under that design, is given by (6). We are interested in the following optimization problem:

Problem 4: Given an imperfect observation system $(\mathcal{X}, \mathcal{W}, \bar{\mathcal{N}}, \mathcal{S}, \mathcal{M}, \mathcal{Z}, \mathcal{N}, \mathcal{Y}, \mathcal{U}, P_{X_1}, P_W, P_{\bar{N}}, P_N, f, \bar{h}, h, \rho, T)$, choose a design $(c^*, l^*, g^*)$ such that

$$J_T(c^*, l^*, g^*) = J_T^* = \min_{(c, l, g) \in \mathcal{C}^T \times \mathcal{L}^T \times \mathcal{G}^T} J_T(c, l, g), \tag{60}$$

where $\mathcal{C}^T = \mathcal{C} \times \cdots \times \mathcal{C}$ ($T$ times), $\mathcal{C}$ is now the space of functions from $\mathcal{S} \times \mathcal{M}$ to $\mathcal{Z}$, $\mathcal{L}^T = \mathcal{L} \times \cdots \times \mathcal{L}$ ($T$ times), $\mathcal{L}$ is now the space of functions from $\mathcal{S} \times \mathcal{M}$ to $\mathcal{M}$, $\mathcal{G}^T = \mathcal{G}_1 \times \cdots \times \mathcal{G}_T$, and $\mathcal{G}_t$ is the space of functions from $\mathcal{Y}^t \times \mathcal{U}^{t-1}$ to $\mathcal{U}$.

Although in Problem 4 the encoder does not know the state of the plant, the problem is conceptually the same as Problem 1, and the solution methodology of Problem 1 works for Problem 4 with very minor changes.

B. Structural Results 

In this section we present structural properties of optimal controllers. For this purpose we define the following:

Definition 4: Let $P_{X_t,M_t}$ and $P_{X_{t+1},M_t}$ be defined as in Definition 1. Define $P_{X_{t+1},S_{t+1},M_t}$ as follows:

$$P_{X_{t+1},S_{t+1},M_t}(x, s, m) = \Pr(X_{t+1} = x, S_{t+1} = s, M_t = m \mid Y^{t+1}, U^t, c^{t+1}, l^t, g^t).$$

These beliefs are related as follows:

Lemma 3: For each stage $t$, there exist deterministic functions $\psi$, $\phi$ and $\nu$ such that

$$P_{X_{t+1},M_t} = \psi(P_{X_t,M_t}, U_t), \tag{61}$$

$$P_{X_{t+1},S_{t+1},M_t} = \phi(P_{X_{t+1},M_t}, Y_{t+1}, c_{t+1}), \tag{62}$$

$$P_{X_{t+1},M_{t+1}} = \nu(P_{X_{t+1},S_{t+1},M_t}, l_{t+1}). \tag{63}$$

Proof: This can be proved along the same lines as the proof of Lemma 1. ■

Using the above relationships it can be shown that the structural result of Theorem 1 also holds for Problem 4. Thus, without loss of optimality, we can restrict attention to controllers of the form (24). These structural results imply that we can formulate a problem equivalent to Problem 4 with a time-invariant action space.
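With imperfect observations, the prediction step $\psi$ is unchanged, while the encoder and the memory now act on the observation rather than on the state, so $\phi$ must also average over the observation channel (57) and $\nu$ must marginalize the observation out. The sketch below is a hypothetical binary example (the noise PMFs, $\bar{h}$, $h$ and the design that forwards and stores the observation are assumptions); the joint-belief construction is our reading of Definition 4 and Lemma 3, not code from the paper.

```python
from collections import defaultdict

# Hypothetical binary example (all values are assumptions for illustration).
S_VALS = (0, 1)
P_NBAR = [0.8, 0.2]                  # observation noise PMF
P_N = [0.9, 0.1]                     # channel noise PMF
hbar = lambda x, n: x ^ n            # observation channel (57)
h = lambda z, n: z ^ n               # communication channel (4)

def obs_lik(s, x):
    """Pr(S = s | X = x) under the observation channel."""
    return sum(p for n, p in enumerate(P_NBAR) if hbar(x, n) == s)

def chan_lik(y, z):
    """Pr(Y = y | Z = z) under the communication channel."""
    return sum(p for n, p in enumerate(P_N) if h(z, n) == y)

def phi_imperfect(P, y, c):
    """(62): belief over (state, sensor observation, memory) after Y_{t+1} = y.

    P is P_{X_{t+1},M_t}; the encoder now acts on the observation: Z = c(s, m).
    """
    joint = {}
    for (x, m), p in P.items():
        for s in S_VALS:
            v = p * obs_lik(s, x) * chan_lik(y, c(s, m))
            if v > 0:
                joint[(x, s, m)] = v
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

def nu_imperfect(P_hat, l):
    """(63): apply the memory update M_{t+1} = l(S, M_t), then drop S."""
    out = defaultdict(float)
    for (x, s, m), p in P_hat.items():
        out[(x, l(s, m))] += p
    return dict(out)

P = {(0, 0): 0.5, (1, 0): 0.5}                    # P_{X_{t+1},M_t}
P_hat = phi_imperfect(P, y=1, c=lambda s, m: s)   # encoder forwards S
P_next = nu_imperfect(P_hat, l=lambda s, m: s)    # memory stores S
pr_x1 = sum(p for (x, _, _), p in P_hat.items() if x == 1)   # approx. 0.74
```

Compared with the perfect-observation sketch, only the likelihood inside $\phi$ gains the extra factor for the observation channel; the memory update still just relabels mass.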

C. Joint Optimization 

We follow the philosophy of Section II-D and use the structural results of the previous section to obtain a sequential decomposition of Problem 4.

Definition 5: Let $\pi^c_t$, $\pi^l_t$, $\pi^g_t$, $t = 1, \ldots, T$, be defined as follows:

1) $\pi^c_t = \Pr(X_t, M_{t-1}, P_{X_t,M_{t-1}})$.

2) $\pi^l_t = \Pr(X_t, S_t, M_{t-1}, P_{X_t,S_t,M_{t-1}})$.

3) $\pi^g_t = \Pr(X_t, M_t, P_{X_t,M_t})$.

Lemma 2 holds for $\pi^c_t$, $\pi^l_t$, $\pi^g_t$ defined above; this can be shown along the same lines as the proof of Lemma 2. Thus, the above unconditional PMFs are information states sufficient for performance evaluation of Problem 4. Hence, Problem 4 is equivalent to a deterministic problem similar to Problem 3, with the transformations $Q^c$, $Q^l$, $Q^g$ appropriately defined. The solution of this deterministic problem is given by nested optimality equations similar to those of Theorem 2. Hence, we obtain a sequential decomposition of Problem 4. Similar results extend to infinite horizon problems using the ideas of Section III.

VI. Conclusion 

We have presented a methodology for determining jointly optimal encoding, memory updating and control strategies for feedback control systems with limited communication over noisy channels. The methodology is applicable to finite horizon problems with the expected total cost criterion, to infinite horizon problems with the expected discounted cost criterion, and to infinite horizon problems with the average cost per unit time criterion. We extend this methodology to problems where the encoder/sensor makes imperfect observations of the state of the system. The resulting optimality equations can be viewed as dynamic programs for POMDPs whose information state is a probability measure on a finite-dimensional Euclidean space and whose action space is uncountable. Hence traditional methods for solving such POMDPs can be used to obtain a solution for feedback control problems with communication constraints.

The methodology presented here can be used to obtain a sequential decomposition of general dynamic team problems with non-classical information structures.